Goto

Collaborating Authors

 queue length


Finite-Time Minimax Bounds and an Optimal Lyapunov Policy in Queueing Control

Liu, Yujie, Tan, Vincent Y. F., Xu, Yunbei

arXiv.org Artificial Intelligence

We introduce an original minimax framework for finite-time performance analysis in queueing control and propose a surprisingly simple Lyapunov-based scheduling policy with superior finite-time performance. The framework quantitatively characterizes how the expected total queue length scales with key system parameters, including the capacity of the scheduling set and the variability of arrivals and departures across queues. To our knowledge, this provides the first firm foundation for evaluating and comparing scheduling policies in the finite-time regime, including nonstationary settings, and shows that the proposed policy can provably and empirically outperform classical MaxWeight in finite time. Within this framework, we establish three main sets of results. First, we derive minimax lower bounds on the expected total queue length for parallel-queue scheduling via a novel Brownian coupling argument. Second, we propose a new policy, LyapOpt, which minimizes the full quadratic Lyapunov drift-capturing both first- and second-order terms-and achieves optimal finite-time performance in heavy traffic while retaining classical stability guarantees. Third, we identify a key limitation of the classical MaxWeight policy, which optimizes only the first-order drift: its finite-time performance depends suboptimally on system parameters, leading to substantially larger backlogs in explicitly characterized settings. Together, these results delineate the scope and limitations of classical drift-based scheduling and motivate new queueing-control methods with rigorous finite-time guarantees.





Knowledge vs. Experience: Asymptotic Limits of Impatience in Edge Tenants

Kiggundu, Anthony, Han, Bin, Schotten, Hans D.

arXiv.org Machine Learning

We study how two information feeds, a closed-form Markov estimator of residual sojourn and an online trained actor-critic, affect reneging and jockeying in a dual M/M/1 system. Analytically, for unequal service rates and total-time patience, we show that total wait grows linearly so abandonment is inevitable and the probability of a successful jockey vanishes as the backlog approaches towards infinity. Furthermore, under a mild sub-linear error condition both information models yield the same asymptotic limits (robustness). We empirically validate these limits and quantify finite backlog differences. Our findings show that learned and analytic feeds produce different delays, reneging rates and transient jockeying behavior at practical sizes, but converge to the same asymptotic outcome implied by our theory. The results characterize when value-of-information matters (finite regimes) and when it does not (asymptotics), informing lightweight telemetry and decision-logic design for low-cost, jockeying-aware systems.


Convergence of Multiagent Learning Systems for Traffic control

Sen, Sayambhu, Bhatnagar, Shalabh

arXiv.org Artificial Intelligence

Rapid urbanization in cities like Bangalore has led to severe traffic congestion, making efficient Traffic Signal Control (TSC) essential. Multi-Agent Reinforcement Learning (MARL), often modeling each traffic signal as an independent agent using Q-learning, has emerged as a promising strategy to reduce average commuter delays. While prior work Prashant L A et. al has empirically demonstrated the effectiveness of this approach, a rigorous theoretical analysis of its stability and convergence properties in the context of traffic control has not been explored. This paper bridges that gap by focusing squarely on the theoretical basis of this multi-agent algorithm. We investigate the convergence problem inherent in using independent learners for the cooperative TSC task. Utilizing stochastic approximation methods, we formally analyze the learning dynamics. The primary contribution of this work is the proof that the specific multi-agent reinforcement learning algorithm for traffic control is proven to converge under the given conditions extending it from single agent convergence proofs for asynchronous value iteration.


Quantifying the Cost of Learning in Queueing Systems

Neural Information Processing Systems

Queueing systems are widely applicable stochastic models with use cases in communication networks, healthcare, service systems, etc. Although their optimal control has been extensively studied, most existing approaches assume perfect knowledge of the system parameters.



A Dual Large Language Models Architecture with Herald Guided Prompts for Parallel Fine Grained Traffic Signal Control

Guo, Qing, Li, Xinhang, Chen, Junyu, Guo, Zheng, Li, Xiaocong, Zhang, Lin, Li, Lei

arXiv.org Artificial Intelligence

Leveraging large language models (LLMs) in traffic signal control (TSC) improves optimization efficiency and interpretability compared to traditional reinforcement learning (RL) methods. However, existing LLM-based approaches are limited by fixed time signal durations and are prone to hallucination errors, while RL methods lack robustness in signal timing decisions and suffer from poor generalization. To address these challenges, this paper proposes HeraldLight, a dual LLMs architecture enhanced by Herald guided prompts. The Herald Module extracts contextual information and forecasts queue lengths for each traffic phase based on real-time conditions. The first LLM, LLM-Agent, uses these forecasts to make fine grained traffic signal control, while the second LLM, LLM-Critic, refines LLM-Agent's outputs, correcting errors and hallucinations. These refined outputs are used for score-based fine-tuning to improve accuracy and robustness. Simulation experiments using CityFlow on real world datasets covering 224 intersections in Jinan (12), Hangzhou (16), and New York (196) demonstrate that HeraldLight outperforms state of the art baselines, achieving a 20.03% reduction in average travel time across all scenarios and a 10.74% reduction in average queue length on the Jinan and Hangzhou scenarios. The source code is available on GitHub: https://github.com/BUPT-ANTlab/HeraldLight.


Near-Optimal Regret-Queue Length Tradeoff in Online Learning for Two-Sided Markets

Yang, Zixian, Varma, Sushil Mahavir, Ying, Lei

arXiv.org Artificial Intelligence

We study a two-sided market, wherein, price-sensitive heterogeneous customers and servers arrive and join their respective queues. A compatible customer-server pair can then be matched by the platform, at which point, they leave the system. Our objective is to design pricing and matching algorithms that maximize the platform's profit, while maintaining reasonable queue lengths. As the demand and supply curves governing the price-dependent arrival rates may not be known in practice, we design a novel online-learning-based pricing policy and establish its near-optimality. In particular, we prove a tradeoff among three performance metrics: $\tilde{O}(T^{1-γ})$ regret, $\tilde{O}(T^{γ/2})$ average queue length, and $\tilde{O}(T^γ)$ maximum queue length for $γ\in (0, 1/6]$, significantly improving over existing results [1]. Moreover, barring the permissible range of $γ$, we show that this trade-off between regret and average queue length is optimal up to logarithmic factors under a class of policies, matching the optimal one as in [2] which assumes the demand and supply curves to be known. Our proposed policy has two noteworthy features: a dynamic component that optimizes the tradeoff between low regret and small queue lengths; and a probabilistic component that resolves the tension between obtaining useful samples for fast learning and maintaining small queue lengths.